README FILE

This document describes the contents of the replication package. The Replication folder is made of five subfolders: do, containing all the Stata programs; matlab, containing all the Matlab programs; data, containing the data files; figures, containing all the figures created by the programs; and excel, containing all the estimates computed by the Stata codes.
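Before running anything, it can help to confirm that the package was unpacked with the five subfolders listed above. The snippet below is a minimal sketch in Python, not part of the replication package. The function name missing_subfolders is hypothetical, and the comparison ignores capitalization because the exact casing of the folder names on disk is an assumption.

```python
from pathlib import Path

# Subfolder names as listed in this README. Their capitalization on disk
# is an assumption, so names are compared case-insensitively.
EXPECTED_SUBFOLDERS = ("do", "matlab", "data", "figures", "excel")


def missing_subfolders(root):
    """Return the expected subfolders that are not found directly under root."""
    root = Path(root)
    present = {p.name.lower() for p in root.iterdir() if p.is_dir()}
    return [name for name in EXPECTED_SUBFOLDERS if name not in present]
```

Calling missing_subfolders on the path of the Replication folder returns an empty list when all five subfolders are present.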


Data
This section describes all the data used in the paper and their sources. All the datasets with the selected variables are provided as .dta files.

-	Censuses of 1980, 1991 and 2000
The Brazilian Censuses are available from IPUMS International (King et al., 2019) and can be downloaded at https://international.ipums.org/international/, or purchased at https://loja.ibge.gov.br/catalogsearch/result/?q=censo. We then used the Data Zoom package to standardize information across Census years: it offers an option that harmonizes the variables that can be made compatible over time. The document entitled Making Census Compatible explains all the procedures adopted in the process.
The database for each year, with the selected variables, is available in the data subfolder in .dta format.

-	Favela Census from Rio de Janeiro
This Census was conducted by the state government of Rio de Janeiro in 2010, and the data are available in the pdf files FavelaAlemao, FavelaManguinhos and FavelaRocinha. We use the information in Tables 31 and 42, and the manipulation of the data can be found in the file Table4.xlsx.

-	PNAD
The household survey Pesquisa Nacional por Amostra de Domicílios (PNAD) provides data on the intergenerational transitions of education levels for households. The PNAD has been conducted annually since 1976, and the 1988 edition includes a special supplement with information on the education levels of the parents of both the household head and the spouse. The PNAD of 1988 can be purchased at https://loja.ibge.gov.br/catalogsearch/result/?q=pnad.
The database for 1988, with the selected variables, is available in the data subfolder in .dta format.


Stata Code (Do Folder)
The Stata codes replicate all the tables and figures in the paper and appendix that are computed in Stata. Each do file is described below.

-	Table1.do: Manipulate the 2000 Census and compute the population distributions by occupation, by sector and by years of education.

-	Table2.do: Manipulate the 1991 Census and compute the average rents for each location (rural and urban for Brazil; rural, slum and city for Rio de Janeiro and São Paulo). The average rents are then copied into the Excel file Table2_results.xlsx and the ratios are computed.

-	Table3.do: Manipulate the 1991 and 2000 Census to compute the Mincer regressions presented in Table 3 of the paper.

-	Table5.do: Manipulate the 1991 Census to compute the Probit regressions presented in Table 5 of the paper.

-	Figure1.do: Manipulate the 1991 Census and compute average school attendance and school lag (years behind in school) for children aged 7 to 14, by location and parent education. The computed numbers are used in the Matlab code Figure1.m to produce Figure 1 of the paper.

-	Figure2.do: Manipulate the 1988 Household Survey (PNAD) and compute the education transition matrices by location (rural, RJ Poorer, RJ Richer). The computed numbers are used in the Matlab code Figure2.m to produce Figure 2 of the paper.

-	TableC2.do: Manipulate the 1988 Household Survey (PNAD) and compute the education transition matrices by location (rural, poorer (slum), richer (city)) for each state.

-	FigureC1.do: Manipulate the 1980 Census to generate the income distribution by occupation (routine and cognitive) and by education level. The estimates are pasted into the Excel file Inc_educ_routine_cognitive_5.xlsx and used in the Matlab code FigureC1.m to produce Figure C1 of the paper.
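
For readers unfamiliar with the education transition matrices computed in Figure2.do and TableC2.do, the following is a minimal sketch in Python of how such a matrix can be built from parent-child pairs. It is not the code in the package. The function name, the integer coding of education levels and the optional survey weights are all assumptions made for illustration.

```python
def transition_matrix(parent_levels, child_levels, n_levels, weights=None):
    """Return a row-normalized matrix P as a list of lists.

    P[i][j] is the (weighted) share of children with education level j
    among children whose parent has education level i. Levels are assumed
    to be coded as integers 0, ..., n_levels - 1 (a hypothetical coding).
    """
    if weights is None:
        weights = [1.0] * len(parent_levels)

    # Accumulate weighted counts of (parent level, child level) pairs.
    counts = [[0.0] * n_levels for _ in range(n_levels)]
    for parent, child, weight in zip(parent_levels, child_levels, weights):
        counts[parent][child] += weight

    # Divide each row by its total. Rows with no observations stay at zero.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([x / total if total > 0 else 0.0 for x in row])
    return matrix
```

Each non-empty row sums to one, so row i can be read as the distribution of children's education conditional on parental education i. Computing one matrix per location gives the objects compared across rural, poorer and richer areas.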


Matlab Code (Matlab Folder)
The Matlab folder contains the codes that produce the figures, calibrate the model and compute the counterfactual exercises.
-	Figure1.m: Produces Figure 1 of the paper and uses the estimates of school attendance and school lag computed in the Stata code Figure1.do.

-	Figure2.m: Produces Figure 2 of the paper and uses the estimates of the education transition probability matrices computed in the Stata code Figure2.do.

-	Figure3.m: Produces Figure 3 of the paper. It uses the education distribution of the children (the next generation) computed from the model equilibrium for the year 1980 and compares it to the data moment of 2010.

-	Figure4.m: Produces Figure 4 of the paper. It uses the education distribution in 2040 (the next generation) computed from the model equilibrium for the benchmark economy and for three counterfactuals: rural areas with city-school quality, slums with city-school quality, and bussing slum children whose parents have at least 1 year of education.

-	FigureC1.m: Produces Figure C1 of the paper and uses the 1980 earnings distribution by education and occupation computed in the Stata code FigureC1.do (the estimated earnings distribution is pasted in the Excel file Inc_educ_routine_cognitive_5.xlsx).

-	applyhatch.m and makehatch.m: Codes created by Ben Hinkle (2024) to apply hatch patterns to the figures produced by the above Matlab codes.

-	TableC1_Parameters_LaborMarketSkills.m: Code to calibrate the parameters mu, sigma and rho of the income distributions by educational levels and occupations. Uses the function Royparams.m. See the estimated parameters in Table C1 of the paper.

-	TableC2_Parameters_EducationTransitions.m: Code to estimate the parameters alphas and betas of the education transition probabilities. Uses the function EstimatingTransitionParametersFunctionRFC.m and the statistics of the transition probability matrices and average schooling for each location (rural, slum and city) for the 27 states (see the Excel file TransitionMatrixRFC.xlsx). See the estimated parameters in Table C2 of the paper.

-	Main_Code_Calibration_1980.m & Main_Code_Calibration_2010.m: Codes used to internally calibrate the parameters in Table 6 of the paper (auxiliary functions used: calibration1980.m and calibration2010.m).

-	Main_Counterfactuals_1980.m & Main_Counterfactuals_2010.m: Codes used to compute the following counterfactual exercises: 
o	City-schools in rural areas;
o	City-schools in slums;
o	Random assignment, i.e., for a given probability theta, a share 1-theta of the slum children stay in the slum schools and a share theta are sent to the city schools;
o	Selective assignment (9+), i.e., the probabilities are e-dependent: the probability of being sent to the C-schools is increasing in the education of the parents;
o	Selective assignment (1+), i.e., the probabilities are e-dependent: the probability of being sent to the C-schools is increasing in the education of the parents;
o	Eliminate slums (tau_f_income=0.99);
o	Facilitate slums (tau_f_income=0.0).

-	Main_poorneighborhood.m: Code used to compute the benchmark economy of an extension of the model where a distant poor neighborhood is added as a fourth location choice. To compute the counterfactual where the transportation cost increases to 0.7, set tau_p to 0.7.

-	UrbanEquilibriumFunction.m, UrbanEquilibriumFunction_poorneigh.m, RuralEquilibriumFunction.m, EquilibriumFunction.m, EquilibriumFunction_RandomAssign_Counterfactual.m, EquilibriumFunction_SelectiveAssign_Counterfactual.m: Auxiliary functions to compute the equilibrium.
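
The random and selective assignment counterfactuals listed under Main_Counterfactuals_1980.m and Main_Counterfactuals_2010.m can be pictured with a small sketch in Python. It is an illustration only and does not reproduce the Matlab functions in this folder. It assumes that a slum child sent to a city school faces the city transition probabilities, and that selective assignment can be approximated by a threshold on parental education. All function names and the numbers in the usage note are hypothetical.

```python
def reassigned_transitions(p_slum, p_city, theta):
    """Mix slum and city transition rows, one row per parental education level.

    theta[e] is the probability that a slum child whose parent has education
    level e attends a city school. Assumption: such a child then faces the
    city transition row for level e; otherwise the slum row applies.
    """
    if not (len(p_slum) == len(p_city) == len(theta)):
        raise ValueError("p_slum, p_city and theta must have the same length")
    mixed = []
    for row_slum, row_city, t in zip(p_slum, p_city, theta):
        if not 0.0 <= t <= 1.0:
            raise ValueError("each theta must lie in [0, 1]")
        mixed.append([(1.0 - t) * s + t * c for s, c in zip(row_slum, row_city)])
    return mixed


def random_assignment(n_levels, theta):
    """Same probability theta for every parental education level."""
    return [theta] * n_levels


def threshold_assignment(n_levels, min_parent_level, theta):
    """Probability theta at or above a parental education threshold, 0 below.

    A threshold is one simple e-dependent rule and is an assumption here.
    """
    return [theta if e >= min_parent_level else 0.0 for e in range(n_levels)]
```

With made-up matrices p_slum = [[0.8, 0.2], [0.5, 0.5]] and p_city = [[0.6, 0.4], [0.2, 0.8]], calling reassigned_transitions(p_slum, p_city, threshold_assignment(2, 1, 0.5)) leaves the first row unchanged and moves the second row halfway toward the city row.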


Excel Files (Excel Folder)

-	Table1_results.xlsx and Table2_results.xlsx: These files contain the statistics computed by the Stata codes Table1.do and Table2.do.

-	Table4.xlsx: This file contains the data from the Favela Census (Tables 32 and 41 from the pdf files).

-	Inc_educ_routine_cognitive_5.xlsx: It contains the earnings distribution estimated in the code FigureC1.do.

-	TransitionMatrixRFC.xlsx: It contains the statistics of the transition probability matrices and average schooling for each location (rural, slum and city) for the 27 states.


Computational Requirements
Running the codes requires Stata and Matlab.

The Matlab and Stata codes were run on a MacBook Pro with an Intel Core i7 processor and 16 GB of memory.


References

Ben Hinkle (2024). Hatched Fill Patterns (https://www.mathworks.com/matlabcentral/fileexchange/1736-hatched-fill-patterns), MATLAB Central File Exchange. Retrieved August 22, 2024.





